Add read-only Config endpoint #9497
I see there are pagination parameters in airflow/airflow/api_connexion/openapi/v1.yaml (lines 1136 to 1138 in d531cd6). But it seems weird to me to have pagination for config, so I would prefer to remove them. WDYT? @mik-laj
@zikun I agree. That's weird. Let's delete these parameters.
@@ -1760,6 +1757,9 @@ components:
  value:
    type: string
    readOnly: true
  source:
I think users can't do anything with this information. They will not change their behavior based on it, because the information comes from an environment variable.
This is inspired by the table on the web configuration page, which has four columns: section, key, value and source. Isn't the source information useful for admin users who need to change and debug the configuration, especially when values come from multiple sources such as airflow.cfg, environment variables and commands?
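As a rough illustration of what the `source` column conveys, here is a minimal sketch of resolving one option from several places. It assumes a simplified precedence (environment variable, then a `{key}_cmd` command in airflow.cfg, then the plain airflow.cfg value, then a built-in default) and uses Airflow's `AIRFLOW__{SECTION}__{KEY}` environment-variable naming. The function `resolve_option`, its `cmd_runner` parameter and the source labels are hypothetical; this is not Airflow's actual resolution logic.

```python
import os
import subprocess
from typing import Callable, Mapping, Optional, Tuple


def run_shell(cmd: str) -> str:
    """Run a shell command and return its stripped stdout."""
    return subprocess.run(cmd, shell=True, check=True,
                          capture_output=True, text=True).stdout.strip()


def resolve_option(
    section: str,
    key: str,
    cfg: Mapping[str, Mapping[str, str]],
    defaults: Mapping[str, Mapping[str, str]],
    env: Optional[Mapping[str, str]] = None,
    cmd_runner: Callable[[str], str] = run_shell,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, source) for one option under a simplified, assumed precedence."""
    env = os.environ if env is None else env
    env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"
    if env_name in env:
        return env[env_name], "env var"
    section_cfg = cfg.get(section, {})
    if f"{key}_cmd" in section_cfg:
        # Hypothetical: the option's value is produced by running a command.
        return cmd_runner(section_cfg[f"{key}_cmd"]), "cmd"
    if key in section_cfg:
        return section_cfg[key], "airflow.cfg"
    if key in defaults.get(section, {}):
        return defaults[section][key], "default"
    return None, None


cfg = {"core": {"parallelism": "32"}}
defaults = {"core": {"parallelism": "32", "dag_concurrency": "16"}}
env = {"AIRFLOW__CORE__PARALLELISM": "64"}
print(resolve_option("core", "parallelism", cfg, defaults, env))      # ('64', 'env var')
print(resolve_option("core", "dag_concurrency", cfg, defaults, env))  # ('16', 'default')
```

The same value ("32" in airflow.cfg) is shadowed by the environment variable, which is the kind of situation the `source` column makes visible.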
I am not sure we will be able to maintain backward compatibility of the API for this field. In my opinion, the value of this field in the API is low, because it refers to values that the API client cannot influence in any way. It may help with debugging problems, but the main goal of the API is to facilitate management, not troubleshooting.
The situation is similar with the Job table, which is not present in the API. Access to it helps with troubleshooting, but the table is not relevant for third-party systems, so it has not been included in the API specification. Each field/endpoint in the API is opt-in, not opt-out, to make backward compatibility easier to maintain.
When deciding on fields, think about whether a field will still be relevant when you have 100 Airflow instances. In that case, you need a different view of the data stored in the system. You may care about the value of a configuration option, e.g. to compare instances, but the source of that value is a technical detail.
We can add endpoints that expose more detailed data in the future, but these endpoints will have to be specially marked to indicate their level of stability.
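To make the "100 instances" argument concrete, here is a minimal sketch that compares option values (not their sources) across instances and reports the options whose values differ. It assumes each instance's configuration has already been fetched and flattened into a `{(section, key): value}` dict; the function name `diff_configs`, the instance names and the `"<unset>"` marker are hypothetical.

```python
from typing import Dict, Mapping, Set, Tuple

OptionKey = Tuple[str, str]  # (section, key)


def diff_configs(
    configs: Mapping[str, Mapping[OptionKey, str]],
) -> Dict[OptionKey, Dict[str, str]]:
    """Return options whose value differs between instances.

    The result maps (section, key) to {instance_name: value}; instances
    that do not define the option are reported with the value "<unset>".
    """
    all_keys: Set[OptionKey] = set()
    for options in configs.values():
        all_keys.update(options)
    differing: Dict[OptionKey, Dict[str, str]] = {}
    for option in sorted(all_keys):
        values = {name: opts.get(option, "<unset>") for name, opts in configs.items()}
        if len(set(values.values())) > 1:  # at least two distinct values
            differing[option] = values
    return differing


configs = {
    "prod-eu": {("core", "parallelism"): "32", ("core", "dag_concurrency"): "16"},
    "prod-us": {("core", "parallelism"): "64", ("core", "dag_concurrency"): "16"},
}
print(diff_configs(configs))
# {('core', 'parallelism'): {'prod-eu': '32', 'prod-us': '64'}}
```

Only values take part in the comparison, which matches the point that the source of a value is a technical detail at this scale.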
I see where you are coming from. I think I am not clear on the main use case of this endpoint. Could you give a specific example of what this endpoint might be used for? For instance, what would people do after querying GET /config from 100 Airflow instances?
Airflow has options that have a big impact on instance performance and resource usage.
parallelism = 32
dag_concurrency = 16
max_active_runs_per_dag = 16
dag_file_processor_timeout = 50
scheduler_heartbeat_sec = 5
job_heartbeat_sec = 5
processor_poll_interval = 1
min_file_process_interval = 0
dag_dir_list_interval = 300
etc.
Users may want to read these values and combine them with data from other applications (e.g. Stackdriver, Zabbix, Prometheus), such as average CPU usage or average memory usage. This will allow us to recommend changes that improve the health of the instance.
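A minimal sketch of the first step of that workflow: pulling the performance-related options listed above out of a `GET /config` JSON response so they can later be joined with metrics from another system. It assumes the JSON mirrors the `sections` / `name` / `options` / `key` / `value` structure used by the serializer code in this PR; the sample payload, the `TUNING_OPTIONS` set and the `tuning_options` function are illustrative, and options are matched by key only so no particular section layout is assumed.

```python
import json
from typing import Dict

# Option keys from the list above; matched by key only, whatever section they are in.
TUNING_OPTIONS = {
    "parallelism", "dag_concurrency", "max_active_runs_per_dag",
    "dag_file_processor_timeout", "scheduler_heartbeat_sec", "job_heartbeat_sec",
    "processor_poll_interval", "min_file_process_interval", "dag_dir_list_interval",
}


def tuning_options(config_json: str) -> Dict[str, int]:
    """Extract the tuning options from a config JSON document as integers."""
    config = json.loads(config_json)
    found: Dict[str, int] = {}
    for section in config["sections"]:
        for option in section["options"]:
            if option["key"] in TUNING_OPTIONS:
                found[option["key"]] = int(option["value"])
    return found


# Illustrative payload; the exact response shape is an assumption.
sample = json.dumps({"sections": [
    {"name": "core", "options": [
        {"key": "parallelism", "value": "32"},
        {"key": "executor", "value": "LocalExecutor"},
    ]},
    {"name": "scheduler", "options": [
        {"key": "dag_dir_list_interval", "value": "300"},
    ]},
]})
print(tuning_options(sample))  # {'parallelism': 32, 'dag_dir_list_interval': 300}
```

The resulting per-instance dict could then be joined with CPU or memory series exported from the monitoring system; that join is not shown here.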
Thanks. I removed it.
Hi @mik-laj, I need some help with the unit test. Because the original
@zikun I'm starting to look at it.
I think we need to give up one response format.
@zikun Here is an example of testing using mock.
This is weird, because Dag Source and Log use different types of responses, and it probably works there.
Thanks a lot for the example. It did not work because I was mocking
I just tried testing with both JSON and text/plain response types. The JSON test failed. I'm looking into the Dag Source and Log PRs now to find the differences that lead to the failure.
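For reference, a minimal sketch of a mock-based test that checks both the plain-text and the JSON rendering against the same mocked configuration, using only `unittest.mock`. `ConfigLoader`, `conf` and `render_config` are hypothetical stand-ins that only mirror the `conf.as_dict()` call from the review snippet; they are not the Airflow objects or the tests from this PR.

```python
import json
from unittest import mock


class ConfigLoader:
    """Hypothetical stand-in for a configuration object with an as_dict() method."""

    def as_dict(self):
        raise RuntimeError("should be mocked in tests")


conf = ConfigLoader()


def render_config(mimetype):
    """Hypothetical endpoint body: render the config as text/plain or JSON."""
    conf_dict = conf.as_dict()  # attribute looked up at call time, so patching works
    if mimetype == "application/json":
        return json.dumps(conf_dict, sort_keys=True)
    return "".join(
        f"[{section}]\n" + "".join(f"{key} = {value}\n" for key, value in options.items())
        for section, options in conf_dict.items()
    )


FAKE_CONF = {"core": {"parallelism": "1024"}}

# Patch the attribute on the object the code under test actually uses.
with mock.patch.object(conf, "as_dict", return_value=FAKE_CONF) as as_dict_mock:
    assert render_config("text/plain") == "[core]\nparallelism = 1024\n"
    assert json.loads(render_config("application/json")) == FAKE_CONF
    assert as_dict_mock.call_count == 2
```

Patching `as_dict` on the shared `conf` object means both response formats see the same fake data, so a failure in one format points at its serializer rather than at the mock setup.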
Force-pushed from c18722d to 344943b.
I fixed the JSON test, so it now works for both text/plain and JSON. Only one pylint check is still failing.
I converted the tests from unittest to pytest, since I remember there was a discussion about moving from unittest to pytest.
All checks passed @mik-laj
I've finished work for today. Please ping me tomorrow.
config_text = '\n'.join(
    f'[{config_section.name}]\n' +
    ''.join(f'{config_option.key} = {config_option.value} # source: {config_option.source}\n'
            for config_option in config_section.options)
    for config_section in config.sections
)
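To see what the join expression above produces, here is a self-contained version. The `Config`, `ConfigSection` and `ConfigOption` dataclasses are hypothetical stand-ins that only mirror the attribute names the snippet uses (`sections`, `name`, `options`, `key`, `value`, `source`); they are not the schema classes from the PR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConfigOption:
    key: str
    value: str
    source: str


@dataclass
class ConfigSection:
    name: str
    options: List[ConfigOption] = field(default_factory=list)


@dataclass
class Config:
    sections: List[ConfigSection] = field(default_factory=list)


def config_to_text(config: Config) -> str:
    # One "[section]" header per section and one "key = value # source: ..." line
    # per option; '\n'.join puts a blank line between consecutive sections.
    return '\n'.join(
        f'[{config_section.name}]\n' +
        ''.join(f'{config_option.key} = {config_option.value} # source: {config_option.source}\n'
                for config_option in config_section.options)
        for config_section in config.sections
    )


config = Config([
    ConfigSection("core", [ConfigOption("parallelism", "32", "airflow.cfg")]),
    ConfigSection("scheduler", [ConfigOption("job_heartbeat_sec", "5", "env var")]),
])
print(config_to_text(config))
```

The printed text is INI-like: each section header is followed by its options, and sections are separated by a blank line.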
What would you say to using some helper methods like:
def _make_single_record(config_option):
    return f'{config_option.key} = {config_option.value} # source: {config_option.source}\n'

def _make_single_section(config_section):
    # Join the option lines first; a bare generator inside the f-string would not be expanded.
    records = ''.join(_make_single_record(o) for o in config_section.options)
    return f'[{config_section.name}]\n{records}'

def _config_to_plain_text(config):
    return '\n'.join(_make_single_section(s) for s in config.sections)
I do not see the benefits of such a fine-grained split.
I think we can split it in a different way:
serializer = {
    'text/plain': func1,        # placeholder: renders the config as plain text
    'application/json': func2,  # placeholder: renders the config as JSON
}
response_types = serializer.keys()
conf_dict = conf.as_dict()
config = conf_dict_to_config(conf_dict)
return_type = request.accept_mimetypes.best_match(response_types)
if return_type not in serializer:
    return Response(status=406)  # 406 Not Acceptable
config_text = serializer[return_type](config)
return Response(config_text, headers={'Content-Type': return_type})
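The dispatch-table idea can be exercised without Flask. In this sketch, `best_match` is a deliberately simplified, hypothetical stand-in for Werkzeug's `request.accept_mimetypes.best_match` (it understands `q=` weights and `*/*`, but not `type/*` wildcards or specificity rules). `config_to_text` and `config_to_json` work on a plain nested dict rather than the PR's schema objects, and `get_config` returns a `(status, body, content_type)` tuple instead of a `Response`.

```python
import json
from typing import Callable, Dict, Iterable, Optional, Tuple

ConfDict = Dict[str, Dict[str, str]]


def best_match(accept_header: str, supported: Iterable[str]) -> Optional[str]:
    """Pick the supported type with the highest q-value (simplified negotiation)."""
    supported = list(supported)
    best, best_q = None, 0.0
    for part in accept_header.split(','):
        media_type, *params = [p.strip() for p in part.split(';')]
        q = 1.0
        for param in params:
            name, _, value = param.partition('=')
            if name.strip() == 'q':
                q = float(value)
        candidates = supported if media_type == '*/*' else [media_type]
        for candidate in candidates:
            if candidate in supported and q > best_q:  # q=0 never matches
                best, best_q = candidate, q
    return best


def config_to_text(conf_dict: ConfDict) -> str:
    return ''.join(
        f'[{section}]\n' + ''.join(f'{k} = {v}\n' for k, v in options.items())
        for section, options in conf_dict.items()
    )


def config_to_json(conf_dict: ConfDict) -> str:
    return json.dumps(conf_dict)


SERIALIZERS: Dict[str, Callable[[ConfDict], str]] = {
    'text/plain': config_to_text,
    'application/json': config_to_json,
}


def get_config(accept_header: str, conf_dict: ConfDict) -> Tuple[int, str, Optional[str]]:
    return_type = best_match(accept_header, SERIALIZERS)
    if return_type is None:
        return 406, '', None  # 406 Not Acceptable: no supported representation
    return 200, SERIALIZERS[return_type](conf_dict), return_type


conf_dict = {'core': {'parallelism': '32'}}
print(get_config('application/json', conf_dict))  # (200, '{"core": {"parallelism": "32"}}', 'application/json')
print(get_config('text/html', conf_dict))         # (406, '', None)
```

Keeping the serializers in one dict means adding a format is a one-line change, and the 406 path falls out of the lookup rather than needing a separate check per format.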
Thanks. I think both of your suggestions are good, and we can combine them.
It's good to break the code into smaller functions, especially as they handle different scopes, just like having nested classes for ConfigSchema. One benefit I can think of: if we want to offer finer-grained endpoints like /config/{section}/{option}, we can easily reuse those small functions.
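As an illustration of the reuse argument, here is a hypothetical handler for a finer-grained `/config/{section}/{option}` endpoint (not part of this PR) that reuses the same per-option and per-section formatters as the full view. The function names, the plain-dict config and the `(status, body)` return convention are assumptions made for the sketch.

```python
from typing import Dict, Tuple

ConfDict = Dict[str, Dict[str, str]]


def option_to_text(key: str, value: str) -> str:
    """Format a single option; shared by the full and per-option views."""
    return f'{key} = {value}\n'


def section_to_text(name: str, options: Dict[str, str]) -> str:
    return f'[{name}]\n' + ''.join(option_to_text(k, v) for k, v in options.items())


def get_option(conf_dict: ConfDict, section: str, option: str) -> Tuple[int, str]:
    """Hypothetical GET /config/{section}/{option}: 404 if either part is unknown."""
    options = conf_dict.get(section)
    if options is None or option not in options:
        return 404, ''
    # Reuse the section formatter with a one-option section.
    return 200, section_to_text(section, {option: options[option]})


conf_dict = {'core': {'parallelism': '32', 'dag_concurrency': '16'}}
print(get_option(conf_dict, 'core', 'parallelism'))  # (200, '[core]\nparallelism = 32\n')
print(get_option(conf_dict, 'core', 'missing'))      # (404, '')
```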
Co-authored-by: Tomek Urbaszek <turbaszek@apache.org> Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Force-pushed from e2e0f35 to 045fbbb.
Force-pushed from 045fbbb to b9dcc7d.
Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com> Co-authored-by: Tomek Urbaszek <turbaszek@apache.org> Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Closes #8136